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Many data sources report related variables of interest that are 
also referenced over geographic regions and time; however, there are 
relatively few general statistical methods that one can readily use that 
incorporate these multivariate spatio-temporal dependencies. Addi¬ 
tionally, many multivariate spatio-temporal areal data sets are ex¬ 
tremely high dimensional, which leads to practical issues when formu¬ 
lating statistical models. For example, we analyze Quarterly Work¬ 
force Indicators (QWI) published by the US Census Bureau’s Longi¬ 
tudinal Employer-Household Dynamics (LEHD) program. QWIs are 
available by different variables, regions, and time points, resulting 
in millions of tabulations. Despite their already expansive coverage, 
by adopting a fully Bayesian framework, the scope of the QWIs can 
be extended to provide estimates of missing values along with asso¬ 
ciated measures of uncertainty. Motivated by the LEHD, and other 
applications in federal statistics, we introduce the multivariate spatio- 
temporal mixed effects model (MSTM), which can be used to ef¬ 
ficiently model high-dimensional multivariate spatio-temporal areal 
data sets. The proposed MSTM extends the notion of Moran’s I ba¬ 
sis functions to the multivariate spatio-temporal setting. This ex¬ 
tension leads to several methodological contributions, including ex¬ 
tremely effective dimension reduction, a dynamic linear model for 
multivariate spatio-temporal areal processes, and the reduction of a 
high-dimensional parameter space using a novel parameter model. 
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1. Introduction. Ongoing data collection from the private sector along 
with federal, state, and local governments have produced massive quanti¬ 
ties of data measured over geographic regions (areal data) and time. This 
unprecedented volume of spatio-temporal data contains a wide range of vari¬ 
ables and, thus, has created unique challenges and opportunities for those 
practitioners seeking to capitalize on their full utility. For example, method¬ 
ological issues arise because these data exhibit complex multivariate spatio- 
temporal covariances that may involve nonstationarity and interactions be¬ 
tween different variables, regions, and times. Additionally, the fact that these 
data (with complex dependencies) are often extremely high-dimensional (so 
called “big data”) leads to the important practical issue associated with 
computation. 

As an example, the US Census Bureau’s Longitudinal Employer-Household 
Dynamics (LEHD) program produces estimates of US labor force variables 
called Quarterly Workforce Indicators (QWIs). The QWIs are derived from 
a combination of administrative records and data from federal and state 
agencies [Abowd et al. (2009)]. The sheer amount of QWIs available is un¬ 
precedented, and has made it possible to investigate local (in space-time) 
dynamics of several variables important to the US economy. For example, 
the average monthly income QWI is estimated quarterly over multiple re¬ 
gions and industries (e.g., education, manufacturing, etc.). In total, there 
are 7,530,037 quarterly estimates of average monthly income. 

The QWIs present interesting methodological challenges. In particular, 
not every state signs a new Memorandum of Understanding (MOU) each 
year and, hence, QWIs are not provided for these states [Abowd et al. (2009), 
Section 5.5.1]. Furthermore, some data are suppressed at certain regions and 
time points due to disclosure limitations [Abowd et al. (2009), Section 5.6]. 
Another limitation is that uncertainty measures are not made publicly avail¬ 
able. Consequently, it is difficult for QWI data users to assess the quality of 
the published estimates. Thus, producing a complete set of estimates (i.e., 
national coverage) that have associated measures of uncertainty is extremely 
important and provides an unprecedented tool for the LEHD-user commu¬ 
nity. As such, we take a fully Bayesian approach to estimating quarterly 
measures of average monthly income and, thus, provide a complete set of 
estimates that have associated measures of uncertainty. 

A fully Bayesian model that can efficiently and jointly model a correlated 
(over multiple variables, regions, and times) data set of this size (7.5 x 10 6 ) 
is unprecedented. It is instructive to compare the dimensionality of the QWI 
to data sets used in spatial analyses in other scientific domains. For example, 
Banerjee et al. (2008) use a fully Bayesian approach to analyze a multivariate 
spatial agricultural data set consisting of 40,500 observations; Cressie and 
Johannesson (2008) use an empirical Bayesian approach to analyze a spatial 
data set of total column ozone with 173,405 observations; Lindgren, Rue and 
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Lindstrom (2011) use a fully Bayesian approach to analyze climate spatially 
using approximately 32,000 observations; and Sengupta et al. (2012) use an 
empirical Bayesian approach to analyze cloud fractions using a data set of 
size 2,748,620. Furthermore, none of these methods allow for multivariate 
dependencies between different geographic regions and time points. 

Despite the wide availability of high-dimensional areal data sets exhibit¬ 
ing multivariate spatio-temporal dependencies, the literature on modeling 
multivariate spatio-temporal areal processes is relatively recent by compari¬ 
son. For example, various multivariate space-time conditional autoregressive 
(CAR) models have been proposed by Carlin and Banerjee (2003), Congdon 
(2002), Pettitt, Weir and Hart (2002), Zhu, Eickhoff and Yan (2005), Daniels, 
Zhou and Zou (2006), and Tzala and Best (2008), among others. However, 
these methodologies cannot efficiently model high-dimensional data sets. Ad¬ 
ditionally, these approaches impose separability and various independence 
assumptions, which are not appropriate for many settings, as these mod¬ 
els fail to capture important interactions and dependencies between differ¬ 
ent variables, regions, and times [Stein (2005)]. Hence, we introduce the 
multivariate spatio-temporal mixed effects model (MSTM) to analyze high- 
dimensional multivariate data sets that vary over different geographic re¬ 
gions and time points. 

The MSTM is built upon the first order linear dynamic spatio-temporal 
model (DSTM) [Cressie and Wikle (2011)]. To date, no DSTM has been 
proposed to analyze multivariate high-dimensional areal data, and, as a re¬ 
sult, the components of the MSTM require significant methodological de¬ 
velopment. Specifically, we introduce novel classes of multivariate spatio- 
temporal basis functions, propagator matrices, and parameter models to be 
used within the MSTM. 

The components of the MSTM can be specified to have a computationally 
advantageous reduced rank structure [e.g., see Wikle (2010)], which allows 
us to analyze high-dimensional areal data (e.g., QWIs from the LEHD pro¬ 
gram). This reduced rank structure is achieved, in part, by extending var¬ 
ious aspects of the model suggested by Hughes and Haran (2013) from the 
univariate spatial-only setting to the multivariate spatio-temporal setting. 
Specifically, we extend the Moran’s I (MI) basis functions to the multivariate 
spatio-temporal setting [for the spatial-only case see Griffith (2000, 2002, 
2004), Griffith and Tiefelsdorf (2007), Hughes and Haran (2013), Porter, 
Wikle and Holan (2015)]. Further, we propose a novel propagator (or tran¬ 
sition) matrix for the first-order vector autoregressive—VAR(l)—model, 
which we call the MI propagator matrix. In this context, the propagator 
matrix of the VAR(l) model is specified to have a desirable nonconfounding 
property, which is similar to the specification of the multivariate spatio- 
temporal MI basis functions. 
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We also propose an extension of the spatial random effects covariance 
parameter model used in Hughes and Haran (2013) and Porter, Holan and 
Wikle (2015), which we call the MI prior. Here, we interpret the MI prior as 
a rescaling of the covariance matrix that is specified to be close (in Frobe- 
nius norm) to a “target precision” matrix. This parameterization signifi¬ 
cantly reduces the dimensionality of the parameter space, thereby reduc¬ 
ing the computational burden associated with fully Bayesian inference in 
high-dimensional spatio-temporal settings. Furthermore, this target preci¬ 
sion matrix can be sensibly chosen based on knowledge of the underlying 
spatial process. 

In addition to modeling QWIs from the LEHD, the MSTM can be used to 
effectively address numerous statistical modeling and analysis problems in 
the context of multivariate spatio-temporal areal data. For example, besides 
analyzing high-dimensional data, the MSTM can also be used to model non- 
separable and nonstationary covariances, and to combine data from multiple 
repeated surveys. Although we mainly focus on modeling high-dimensional 
multivariate spatio-temporal areal data (e.g., QWIs from the LEHD), the 
MSTM is tremendously flexible and can be readily adapted to other settings. 

The remainder of this article is organized as follows. In Section 2 we in¬ 
troduce the LEHD-QWI data set and further describe the methodological 
challenges that we consider. Next, in Section 3 we provide mathematical 
foundations for the MSTM. Then, in Section 4 we introduce the multivari¬ 
ate spatio-temporal MI basis functions, the MI propagator matrix, and the 
parameter model for the covariance matrix of the random effects term. Sec¬ 
tion 5 provides an empirical study that is used to evaluate the effectiveness 
of the MSTM in recovering the unobserved latent process (“true” underly¬ 
ing values). Additionally, in Section 5 we use the MSTM to jointly analyze 
all 7,530,037 QWIs obtained from the US Census Bureau’s LEHD program. 
Finally, Section 6 contains discussion. For convenience of exposition, proofs 
of the technical results and details surrounding the MCMC algorithm are 
left to an Appendix. 

2. LEHD—Quarterly Workforce Indicators. The LEHD program pro¬ 
vides public access QWIs on several earnings variables for each quarter of 
the year over various geographies of the US (http://www.census.gov/). For a 
comprehensive description regarding the creation of QWIs, see Abowd et al. 
(2009). Here, we consider quarterly measures of average monthly income for 
individuals with steady jobs. A subset of this data set representing QWIs 
for 2970 US counties for women in the education industry during the third 
quarter of 2006 is displayed in Figure 1. However, the QWIs are much more 
extensive. Specifically, the quarterly average monthly income for individu¬ 
als that have a steady job is available over 92 quarters (ranging from 1990 
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Fig. 1. We present the QWI for quarterly average monthly income (US dollars) for 2970 
US counties, for women, for the education industry, and for the third quarter of 2006. The 
white areas indicate QWIs that are not made available by LEHD. 

to 2013), all of the 3145 US counties, by each gender, and by 20 differ¬ 
ent industries. This results in the aforementioned data set having 7,530,037 
observations—which we model jointly. 

The high-dimensional nature of QWIs and expansive coverage (e.g., quar¬ 
terly average monthly incomes) allows economists and other subject matter 
researchers to study differences in key US economic variables over many re¬ 
gions and times. Consequently, QWIs have had a significant impact on the 
economics literature; for example, see Davis et al. (2006), Thompson (2009), 
Dube, Lester and Reich (2013), Allegretto et al. (2013), among others. This 
demand for QWIs shows a clear need for developing statistical methodology 
that can be used to analyze such high-dimensional data sets. The current 
statistical approaches available cannot capitalize on the full utility of the 
QWIs. For example, Abowd, Schneider and Vilhuber (2013) limit the spa¬ 
tial and temporal scope of their analysis, which allows them to efficiently 
analyze only a portion of the QWIs. 

The complexity of the QWIs is further exacerbated by missing values; 
by “missing” we mean that the QWI is not provided by the LEHD pro¬ 
gram. Consider the (quarterly) average monthly income example, the total 
gender/industry/space/time combinations results in 2 x 20 x 3145 x 92 = 
11,573,600 possible QWIs. Hence, roughly 35% of the QWIs are missing. 
This leads to a total of 11,573,600 2 pairwise covariances that require mod¬ 
eling using random effects. Nevertheless, allowing for multivariate spatio- 
temporal covariances is extremely important from the perspective of pre¬ 
dicting (imputing) missing QWIs. 

As an example, in Figure 1, one might expect the quarterly average 
monthly income for men to be associated with the value for quarterly av¬ 
erage monthly income for women. Likewise, nearby observations in space 
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and time are often similar in value [Cressie and Wikle (2011)]. If no multi¬ 
variate spatio-temporal dependencies are present in the data, then one can 
not borrow strength among “similar” variables and “nearby” observations 
to improve the precision of the estimated QWIs. An exploratory analysis, 
based on the empirical covariance matrices computed from the log QWIs 
(not shown), indicates that the QWIs are indeed correlated across different 
variables, regions, and times. Consequently, this suggests that a statistical 
model that allows for multivariate spatio-temporal dependence can be effi¬ 
ciently utilized to predict (impute) QWIs. 

3. The multivariate spatio-temporal mixed effects model. The DSTM 
framework is a well-established modeling approach used to analyze data 
referenced over space and time. This approach is extremely flexible since 
it allows one to define how a group of spatial regions temporally evolve 
[e.g., see Cressie and Wikle (2011), page 13], as opposed to defining the 
temporal evolution of a process at each geographic region of interest. The 
MSTM represents a novel extension of the DSTM to the multivariate areal 
data setting, where we now allow groups of spatially referenced variables to 
evolve over time. Thus, in Sections 3.1 and 3.2 we introduce the MSTM in 
terms of the familiar “data model” and “process model” DSTM terminology 
[Cressie and Wikle (2011)]. 


3.1. The MSTM data model. The data model for the MSTM is defined 
as 


( 1 ) 


zV>(A) = Y®(A)+e<f>(Ay, 

£ = 1 ,...,L,t = T® 


rpiT) 

■Au 


A<E D 


W 

p,v 


where represents multivariate spatio-temporal areal data. The com¬ 

ponents of (1) are defined and elaborated as follows: 


1. The subscript “t” denotes discrete time, and the superscript “£” indexes 
different variables of interest (e.g., the QWI for women in the education 
industry). There are a total of L variables of interest (i.e., £ = 1,... ,L) 
and we allow for a different number of observed time points for each of 
the L variables of interest (i.e., for variable £, t = T^\ .. ., T^P ). 

2. We require T^ \ ■ ■ ■, T^P to be on the same temporal scale (e.g., quarterly) 
for each £, T^ < Ty\ min(T^) = 1, and max(T^) = T > 1. 

3. The set A represents a generic areal unit. For example, a given set A 
might represent a state, county, or a census tract. Denote the collection 
of all nf 1 2 3 observed areal units with the set Dq\ = {A^P :i = 1 ,..., nf ' 1 }; 
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£ = 1,L. The observed data locations are different from the predic¬ 
tion locations D^ t = : j = 1,..., }, that is, we consider predict¬ 

ing on a spatial support that may be different from {D^ t } (e.g., the 
counties with missing QWIs are not included in {Dq\}, but are included 
in {Dp\}). Additionally, denote the number of prediction locations at 

time t as Nt = and the total number of prediction locations as 

N = Y'J—t -W In a similar manner, the number of observed locations at 
time t and total number of observations are given by nt = n t^ an d 

n = Ylt=i n u respectively. 

4. The random process represents the ith. variable of interest at 

time t. For example, (•) might represent the quarterly average monthly 
income for women in the education industry at time t. The stochastic 
properties of {Y ^(•)} are defined in Section 3.2. Latent processes like 

(y t W (-)} have been used to incorporate spatio-temporal dependencies 
[e.g., see Cressie and Wikle (2011)], which we modify to the multivariate 
spatio-temporal areal data setting. 

5. It is assumed that e^(-) is a white-noise Gaussian process with mean 

zero and unknown variance varjej^(•)} = vf' > (•) for £= 1,...,L, and 
t = T^\ ... ,Ty \ The presence of (■)} in (1) allows us to take into 
account that we do not perfectly observe (y/^(-)}, and instead observe 
a noisy version In many settings, there is information that we 

can use to define {ef 1 (■)} (e.g., information provided by the statistical 
agency). If one does not account for this extra source of variability, then 
the total variability of the process (Ly '(•)} may be underestimated. For 
example, Finley et al. (2009) show that if one ignores white-noise error 
in a Gaussian linear model, then one underestimates the total variability 
of the latent process of interest. 


3.2. The MSTM process model. The process model for MSTM is defined 
as 


(2) 


y/ £) (A) = iif 1 (A) + S P(Ayrj t + rf £) (A); 


£ = l,...,L,t = T® 


rp{Q 

■■‘ 1 U 


AeD 


{() 

p ,t- 


In (2), Y t W (-) represents the Ith. spatial random process of interest at time 
t, which is modeled by three terms on the right-hand side of (2). The first 
term [i.e., '(•)}] is a fixed effect, which is unknown, and requires es¬ 

timation. We set /if } (-) =x^(-)73 t , where xf } is a known p-dimensional 
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vector of covariates and (3 t £ M p is an associated unknown parameter vec- 
tor; i = 1,..., L and t = 1,..., T. In general, we allow both x) ' and (3 t to 
change over time; however, in practice, one must assess whether or not this 
is appropriate for a given application. For the QWI example we specify x) 
and (3 t to be constant over time. 

The second term on the right-hand side of (2) [i.e., {sj^(-)'77 f }] repre¬ 
sents multivariate spatio-temporal dependencies. The r-dimensional vectors 
of multivariate spatio-temporal basis functions sj^(-) = (S^ (■),..., sj £ J (■))' 
are prespecified for each t = 1,..., T and i = 1,..., L, and in Section 4.1 we 
propose a new class of multivariate spatio-temporal basis functions to use 
in (2). The r-dimensional random vector r] t is assumed to follow a spatio- 
temporal VAR(l) model [Cressie and Wikle (2011), Chapter 7] 

(3) r/ t = +u*; t = 2,3,...,T, 

where for all t the r-dimensional random vector r] t is Gaussian with mean 
zero and has an unknown r x r covariance matrix IQ; M 4 is a r x r known 
propagator matrix (see discussion below); and Uf is an r-dimensional Gaus¬ 
sian random vector with mean zero and unknown r x r covariance matrix 
W; and is independent of rj t _ 1 . 

First order vector autoregressive models may offer more realistic struc¬ 
ture with regards to interactions across space and time. This is a feature 
that cannot be included in the alternative modeling approaches discussed 
in Section 1. Additionally, the (temporal) VAR(l) model has been shown 
to perform well (empirically) in terms of both estimation and prediction 
for federal data repeated over time [Jones (1980), Bell and Hillmer (1990), 
Feder (2001)]. 

The r-dimensional random vectors {r] t } are not only used to model tem¬ 
poral dependencies in {Y['^(-)}, but are also used to model multivariate 
dependencies. Notice that the random effect term rj t is common across all 
L processes. Allowing for a common random effect term between different 
processes is a straightforward way to induce dependence [Cressie and Wikle 
(2011), Chapter 7.4]. This strategy has been previously used in the univari¬ 
ate spatial and multivariate spatial settings [e.g., see Royle et al. (1999), 
Finley et al. (2009), and Banerjee et al. (2010)] and has been extended here. 

Finally, the third term on the right-hand side of (2) [i.e., (•)}] repre¬ 

sents fine-scale variability and is assumed to be Gaussian white noise with 
mean zero and unknown variance {cr| t }. In general, {£^(-)} represents the 

leftover variability not accounted for by {S^(-) / r 7 t }. One might consider 

modeling spatial covariances in {(■)}• Minor adjustments to our method¬ 
ology could be used to incorporate, for example, a CAR model [Banerjee, 
Carlin and Gelfand (2004), Chapter 3], tapered covariances [Cressie (1993), 
page 108], or block diagonal covariances [Stein (2014)] in (•)}• 
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4. Multivariate spatio-temporal mixed effects model specifications. Many 
specifications of the MSTM require methodological development before one 
directly can apply it to the QWIs. In particular, we need to specify the mul¬ 
tivariate spatio-temporal basis functions {Sf(-)}, the propagator matrices 
{M(}, and the parameter models for {K f } and {W(}. These contributions 
are detailed in Sections 4.1, 4.2, and 4.3, respectively. 


4.1. Moran’s I basis functions. In principle, the r-dimensional vector 
sj^(-) can belong to any class of spatial basis functions; however, we use 
Moran’s I (MI) basis functions, since they have many properties that are 
needed to accurately and efficiently model QWIs. In particular, the MI ba¬ 
sis functions can be used to model areal data in a reduced dimensional space 
(i.e., r<n). This feature allows for fast computation of the distribution of 
{rj t }, which can become computationally expensive for large r. This will be 
especially useful for analyzing the QWIs in Section 5.3, which consists of 
7,530,037 observations. Additionally, the MI basis functions allow for non- 
stationarity in space, which is a realistic property for modeling QWIs (see 
Section 2 for a discussion). 

A defining (and mathematically desirable) property of the MI basis func¬ 
tions is that they guarantee there are no issues with confounding between 
fixed and random effects. This property of removing any confounding frees us 
to consider inferential questions in addition to multivariate spatio-temporal 
prediction. For example, the QWIs can be used to investigate the degree 
of gender inequality in the US by comparing the mean (i.e., /if 1 ) average 
monthly income for men and women, respectively. 

Thus, to derive MI basis functions to use for QWIs, we extend this defining 
property to the multivariate spatio-temporal setting. Here, the derivation 
starts with the MI operator. Recall that the MI statistic is a measure of 
association, which equals to a weighted sums of squares where the weights 
are called the MI operator [see Hughes and Haran (2013)]. At time t the MI 
operator is explicitly defined as 


( 4 ) 


G(X t , A t ) = (I Nt - X i (X(Xi) _1 X()At(lAr t - X t (X{X t )- 1 X' 


th 


t = 1,... ,T, 


where the Nt x p matrix X* = (xj^(A) : i = 1,..., L, A € Dp\)', Ijy t is an 
Nt x Nt identity matrix, and A t is the Af x Af adjacency matrix correspond¬ 
ing to the edges formed by {D^ \ : l = 1,..., L}. Notice that the MI operator 
in (4) defines a column space that is orthogonal to Xj. This can be used 
to ensure nonconfounding between (3 t and r) t . Specifically, from the spectral 
representation G(X f , A t ) = x.GjAx.G.t&x G t-> we denote the N t x r real 
matrix formed from the first r columns of &x,G,t as S x,t- Additionally, we 
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set the row of S x,t that corresponds to variable £ and areal unit A equal to 
S^(A). Thus, by definition, for each t the Nt x p matrix of covariates Xt is 
linearly independent of the columns of the Nt x r matrix of basis functions 
S x,t and, hence, there are no issues with confounding between /3 t and rj t . 

It is important to emphasize that the orthogonalization of X* to obtain 
S x,t is done over the support of the entire spatial region (i.e., -Dp,t), which 
removes confounded random effects at any prediction location of interest. In 
principal, one might use an orthogonalization over a subset, say D C Dpt, 
and use a different class of basis functions to define sj^(-) at prediction 
locations outside D. However, in this case prediction locations outside D 
may suffer from problems with confounding and, hence, inference on the 
underlying mean l-if' 1 (;) may be incorrect. 

4.2. Moran’s I propagator matrix. The problem of confounding provides 
motivation for the definition of the MI basis functions {S^ ( (-)}. In a similar 
manner, the problem of confounding manifests in a spatio-temporal VAR(l) 
model and can be addressed through careful specification of {M f }. To see 
this, substitute (3) into (2) to obtain 

(5) y t = X t (3 t + + S x ,tu t + &; t = 2,..., T, 

where y t = (y/ £) (A) : i = 1,..., L, A e and = (ff } (A) : l = 1,..., L, 

A € Dp\)' are A^-dimensional latent random vectors. The specification of 
{Sjqt} using MI basis functions implies that there are no issues with con¬ 
founding between {/3 t } and {u f }; however, depending on our choice for 
{Mt}, there might be issues with confounding between r] t _ 1 and the (pTri¬ 
dimensional random vector = (j3' t , u£)'; t = 2,..., T [although the VAR(l) 
model assumes u t is independent of Then, rewriting (5), we get 

(6) S'x : t(yt ~ £t) = BrCt + Mtrjj.p t. = 2,...,T, 

where the r x (p + r) matrix Bf = (S ' r x ( Xf, I). The representation in (6) gives 
rise to what we call the MI propagator matrix, which is defined in an anal¬ 
ogous manner to the MI basis functions. Using the spectral representation 
of G(B(, I r ) = &G,B,tA-G,B,t^G B ti we se ^ the r x r real matrix equal to 
the first r columns of *&G,B,t f° r each t, which is denoted with M^t. 

Notice that there are no restrictions on {M^(} to mathematically guaran¬ 
tee that M B,t does not become “explosive” as t increases. Thus, one should 
investigate whether or not this is the case when using this model for “long- 
lead” forecasting. One should also be aware that we do not treat {M t } as 
an unknown parameter matrix to be estimated. Instead, we chose a specific 
form for {Mt}, namely, {M^}, that avoids confounding between {rj t } and 
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{£ t }. As a result, the final form of {M 5 J might not be spatially inter¬ 
pretable. This issue is addressed in Section 4.3, where constraints are added 
to the parameter model so that cov(r] t ) = t + Wf is spatially 

interpretable. Nevertheless, it is a huge advantage in spatio-temporal model¬ 
ing to have a known propagator matrix, as a prominent historical challenge 
with such models is addressing the curse of dimensionality in estimating 
realistic propagators [Cressie and Wikle (2011), Chapter 7]. 

4.3. Parameter models. Methods for analyzing high-dimensional data 
(like the QWIs) seek to remove ineffectual or redundant information [for 
a more in-depth discussion see Sun and Li (2012)]. In Sections 4.1 and 4.2 
we impose a reduced rank structure and a nonconfounding property and, as 
a result, remove information on high frequencies and confounded random ef¬ 
fects, respectively. Thus, we specify {K f } and {W;} in a manner that offsets 
these needed computational compromises. 

As an example, consider the case where we do not remove confounded ran¬ 
dom effects. Let P x,t = X f (X(X t ) - 1 X t and the column space of P x,t be de¬ 
noted as C(PA',t). Rewrite ( 2 ) and let S t = L x,t] and rj t = ( k' x ,0 fix.t) 1 

so that 

(7) y t = Xf/3f + H.y + L x,tSx,t + £*; t = 2,...,T. 

Here, the N t x h matrix Hx,t £ C(Px,t) _L , the N t x l matrix L x,t £ C{Px,t)i 
h and l are nonnegative integers, Kx,t is a /i-dimensional Gaussian random 
vector, and Sx,t is a Z-dimensional Gaussian random vector; t = 2,... ,T. 
The decomposition in (7) is the space-time analogue of the decomposition 
used for discussion in Reich, Hodges and Zadnik (2006) and Hughes and 
Haran (2013). The use of MI basis functions is equivalent to setting h equal 
to r, H x,t = Sx.i, and L x,t equal to a nt x l matrix of zeros for each t. As 
a result, the model based on MI basis functions ignores the variability due 
to because it is confounded with {/3 f }. In a similar manner, one can 

argue that both the reduced rank structure of the MI basis functions and 
the MI propagator matrix may also ignore other sources of variability. 

To address this concern, we consider specifying {Kf} as positive semi- 
definite matrices that are “close” to target precision matrices (denoted with 
Pf for t = 1,... , T) that do not ignore these sources of variability; critically, 
the use of a target precision matrix allows us to reduce the parameter space 
in a manner that respects the true variability of the process. Specifically, let 
Kf = <4-K t *(Pf), where cr 2 K > 0 is unknown and 

( 8 ) K f *(Pf) = argmin{||Pf-Sx. t C- 1 S^ t ||2}; t = l,...,T. 

c 

Here, || ■ ||p denotes the Frobenius norm. In ( 8 ), we minimize the Frobe- 
nius norm across the space of positive semi-definite matrices. A computable 
expression of K*(Pf) in ( 8 ) can be found in Appendix A. 
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Processes with precision P* do not ignore sources of variability like Sx,t in 

(7) , since P* has principal components in C(XQ and principal components 
associated with high frequencies. Hence, to mitigate the effect of removing 
certain principal components when defining S^(-), we specify the r x r 
matrix Kf to be as close as possible [in terms of the Frobenius norm in 

(8) ] to something that has these principal components, namely, the nt x nt 
matrix P^. That is, we rescale the total variability of our prior covariance to 
account for variability ignored for reasons of computation and confounding. 

There are many choices for the “target precision” matrices {P<} in (8). 
For example, one might use a CAR model and let P* = Qt, where recall 
Q t = I N t — A*; t = 1,... ,T. This allows one to incorporate neighborhood in¬ 
formation into the priors for {IQ}. In the case where the areal units are small 
and regularly spaced, one might consider one of the many spatio-temporal 
covariance functions that are available [e.g., see Gneiting (1999), Cressie 
and Huang (1999), and Stein (2005)]. Alternatively, an empirical Bayesian 
approach might be considered and an estimated precision (or covariance) 
matrix might be used [e.g., see Sampson and Guttorp (1992)]. 

The spatial-only case provides additional motivation for the approach 
in (8). That is, when T = L = 1 and Pi = Qi, the prior specification in (8) 
yields the MI prior introduced in Hughes and Haran (2013). This motivating 
special case is formally stated and shown in Appendix A. 

With both {K f } and {M t } specified we can solve for {W f }, that is, using 
the VAR(l) model 

(9) W t = K t -M B!t Kt- 1 M' Bit = a 2 K W* t -, t = 2,...,T. 

In (9), the r x r matrix W* = K* — t ; t = 2,..., T. It is impor¬ 

tant to note that the r x r matrices in the set {W*} may not be necessarily 
positive semi-definite. If W* is not positive semi-definite for some t, then we 
suggest using the best positive approximation. This is similar to “lifting” ad¬ 
justments suggested by Kang, Cressie and Shi (2010) in the spatio-temporal 
setting. 

The prior distributions for the remaining parameters are specified so that 
conjugacy can be used to obtain exact expressions for the full conditionals 
within a Gibbs sampling algorithm. Specifically, we choose a Gaussian dis¬ 
tribution for {/3 t } and inverse gamma (IG) for and {cr| f }. In many cases 

the statistical agency will provide values for {v^(A)} and, thus, no model 
is required for {v^\A)} in this setting. For our motivating QWI exam¬ 
ple, the LEHD program provides imputation variances for QWIs (http:// 
download.vrdc.Cornell.edu/qwipu.experimental/qwiv/betal/). Impu¬ 
tation variances for QWIs are not available for each county/quarter/industry/ 
gender combination, which is the multivariate spatio-temporal support of the 
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data in Section 2. Thus, we use an IG prior based on the imputation vari¬ 
ances that are available. See Appendix B for details regarding the MCMC 
algorithm, a complete summary of our statistical model, and a discussion 
on alternative model specifications for related settings. 

5. Analysis of quarterly workforce indicators using the MSTM. In this 
section we use the MSTM to analyze quarterly average monthly income. In 
particular, our analysis has two primary goals. The first goal is to demon¬ 
strate that the MSTM can reasonably reproduce latent multivariate spatio- 
temporal fields for the QWI setting. To do this, we perform an “empirical 
study.” Specifically, we perturb a subset of the log quarterly average monthly 
income (log QWIs), introduced in Section 2, then we test whether or not we 
can recover the log QWIs using the perturbed version. (Notice that the sym¬ 
metrizing log transformation is used so that the Gaussian assumptions from 
Section 3 are met.) An empirical study such as this differs from a traditional 
simulation study since the emphasis is on illustrating that the MSTM can 
reproduce values similar to quarterly average monthly income. Therefore, in 
Section 5.1 we introduce our empirical study design and in Section 5.2 we 
provide the results of our empirical study. 

Our second goal in this section is to establish that the MSTM can be effi¬ 
ciently used to jointly model high-dimensional areal data (see Section 2 for 
a discussion). The methodological development in Sections 3 and 4 are mo¬ 
tivated by striking a balance between modeling realistic multivariate spatio- 
temporal dependencies and allowing for the possibility of extremely high¬ 
dimensional data sets. As such, in Section 5.3 we jointly analyze all 7,530,037 
quarterly average monthly income estimates provided by the LEHD pro¬ 
gram. 

For Sections 5.1 through 5.3, the Gibbs sampler, provided in Appendix B, 
was run for 10,000 iterations with a burn-in of 1000 iterations. Convergence 
of the Markov chain Monte Carlo algorithm was assessed visually using trace 
plots of the sample chains, with no lack of convergence detected. Addition¬ 
ally, the batch means estimate of the Monte Carlo error (with batch size 50) 
[e.g., see Roberts (1996), Jones et al. (2006)] and the Gelman-Rubin diag¬ 
nostic (computed using three chains) [e.g., see Gelrnan and Rubin (1992)] 
did not suggest lack of convergence. 

5.1. Empirical study design. Abowd et al. (2009) provide a study to 
assess the quality of the QWIs. Thus, for consistency within the literature we 
adopt a study design similar to the one used in Section 5.7.2 of Abowd et al. 
(2009). Specifically, we restrict the data to t = 4,... ,55 (quarters between 
1991 and 2003), £ = 1,2 (which represents women and men in the education 
industry, respectively), and the prediction locations equal the counties in 
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Minnesota that have available QWIs (i.e., Dp\ = The scope of this 

empirical study is smaller than the entire data set introduced in Section 2, 
since in this section we are primarily interested in showing that the MSTM 
can recover latent multivariate spatio-temporal fields similar to the quarter 
average monthly income. See Section 5.3 for a demonstration of using the 
MSTM to efficiently jointly model the entire 7,530,037 QWIs. 

The perturbed version of the log quarterly average monthly income is 
explicitly written as 

(10) Rf ) {A) = zf ) (A)+6? (A)-, f = 4,...,55,^=l,2,7LG< ) Nit , 

where t is the set of counties in Minnesota (MN) that have available 

quarterly average monthly income estimates, {r[^(A)} represents the per¬ 
turbed version of the log quarterly average monthly income [log QWIs; de¬ 
noted by (■)}], and the set {ef 1 Q4) : t = 4,..., 55,1? = 1,2, A e Dq\} 
consists of i.i.d. normal random variables with mean zero and variance a\. 
In practice, the quarterly average monthly income estimates are publicly 
available and are, hence, observed. Nevertheless, for the purposes of this 
empirical study we will act as if the QWIs are an unobserved multivariate 
spatio-temporal held to be estimated, and treat { R'p } as the data process 
and as the latent process. 

We randomly select 65% of the areal units in t to be “observed,” 
which we denote with the set -D^ NOi . Thus, for this example, Dp t (given 

by -D^ n t ) and Do,t (given by 77^ N Q t ) are not the same. Recall from Sec¬ 
tion 2 that this choice reflects the amount of observed data present in the 
entire QWI data set, where 65% of the QWIs are observed. However, it 
is important to note that the “missing QWI” structure of the data set in 
Section 2 is different from what we use in this empirical study, since we do 
not incorporate missing QWIs patterns that occur due to a state’s failure 
to sign a MOU. Recall that if a state does not sign a MOU for a particular 
year, then the entire state is missing for that year. However, our choice to 
randomly select 65% of the areal units within 77^ N t to be “observed” is 
sufficient for our purposes. 

The value for the perturbation variance a“i is chosen relative to the vari¬ 
ability of the log quarterly average monthly income. The variance of the 
log quarterly average monthly income, within our study region, is given by 

var {Z^(A)} = 0.24. Thus, we specify the perturbations {e£\A) : A € Dq\ } 
to have variance = 0.24. This yields a signal-to-noise ratio of 1, which can 
be interpreted as a small signal-to-noise ratio. We argue that this choice is 
conservative, since small signal-to-noise ratios traditionally make prediction 
of a latent process difficult [Aldworth and Cressie (1999)]. 
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(a) Log QWI For Women (Edc. Industry, 4th Quart, of 1992) (b) Perturbed Log QWI For Women (Edc. Industry, 4th Quart, of 1992) 



(c) Predicted Log QWI For Women (Edc. Industry, 4th Quart, of 1992) 



Fig. 2. (a) Map of the log quarterly average monthly income for women in the education 

industry [i.e., {Z^ (A)}]. These values correspond to log quarterly average monthly income 
for women in the education industry, for counties in Minnesota, and for the fth quarter 
in 1992. For comparison, a map of the perturbed log quarterly average monthly income 
for women in the education industry [i.e., is given in (b). The white areas 

indicate missing regions. In (c) we provide the predictions {Z^\A)} that are computed 
using MSTM and the perturbed data {r[ ( \A)} from equation (10). 

We end this section with an example of analyzing a single realization of 
(A) : t = 4,..., 55, i = 1,2, A € Q f }. Consider the selected maps 
of the log quarterly average monthly income and the perturbed log average 
monthly income in Figure 2(a) and (b), respectively. Figure 2 visually depicts 
the difficulty of predicting a latent random field, as the number of “missing” 
QWIs is rather large and the signal-to-noise ratio is visibly small. 

To use the MSTM to predict {Zp } from {R^}, we need to specify the 
target precision matrix, the covariates, and the number of MI basis func¬ 
tions. Set the target precision matrix equal to {Q t} as previously described 
below (8). Let x.P (A) = 1, where g = 1,2 indexes men and women, respec¬ 
tively. Also, for illustration let r = 30, which is roughly 50% of the available 
MI basis functions at each time point t. In a sensitivity study (not shown), 
we see that the MSTM is relatively robust to changes to larger values of r. 
In general, for the purposes of prediction, large values of r are preferable; 
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(a) Log QWI For Men (Edc. Industry, 4th Quart. of 1992) (b) Perturbed Log QWI For Men (Edc. Industry, 4th Quort. of 1992) 
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<c) Predicted Log QWI For Men (Edc. Industry, 4th Quort. of 1992) 




Fig. 3. (a) Map of the log quarterly average monthly income for men in the education in¬ 

dustry [i.e., {Zg 2 \A)}]. These values correspond to log quarterly average monthly income 
for men in the education industry, for counties in Minnesota, and for the fth quarter in 
1992. For comparison, a map of the perturbed log quarterly average monthly income for 
men in the education industry [i.e., {Wg 2 \A)}[ is given in (b). The white areas indi¬ 
cate missing regions. In (c) we provide the predictions {Zg 2 \A)} that are computed using 
MSTM and the perturbed data {J?^(A)} from equation (10). 


however, a carefully selected reduced rank set of basis functions can produce 
as good or better predictions than those based on the full set of basis func¬ 
tions [Bradley, Cressie and Shi (2011, 2014, 2015)]. Using the MSTM with 

these specifications, we predict Z[ ' using the perturbed values R t . In Fig¬ 
ure 2(c) we present {Z^ (A) : A € Dq g}. In general, we let zf :> denote the 

MSTM predictions based on {Rf\A) : t = 4,..., 55, ^ = 1,2, A € -D^ N Q t }. 
Similar conclusions are drawn from Figure 3, which provides results for men. 

The performance of our predictions are further corroborated by the re¬ 
sults presented in Figure 4(a) and (b), where we map the percent relative 
difference (PRD) between the predicted log quarterly average monthly in¬ 
come and the actual log quarterly average monthly income. That is, the 
values plotted in Figure 4(a) and (b) are given by 


( 11 ) 



zjP(A)-zW(A) \ 

4\a) j 


x 100%; 


£=l,2,AeD® 
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Fig. 4. In (a) and (b), the percent relative difference (PRD) in (11) of the predicted log 
quarterly monthly average income of women and men within the education industry during 
the fourth quarter of 1992. 


Additionally, the median PRD across all variables, regions, and time points 
is 4.87%. Hence, for this example the difference between the predicted and 
actual log quarterly average monthly income is small relative to the scale of 
the log quarterly average monthly income. Thus, we appear to be efficiently 
reproducing the unobserved latent field (as measured by PRD) using the 
MSTM. 


5.2. Empirical study of multiple replicates. There have been no statis¬ 
tical methods used to obtain QWI estimates and measures of precision at 
missing regions. Thus, in this section we evaluate the performance of {} 
at both observed and missing regions over multiple replicates. 

The MSTM from Section 3 is currently the only stochastic modeling 
approach available to jointly model high-dimensional multivariate spatio- 
temporal areal data. Since there are no viable alternative methods available, 
we first assess the quality of the predictions relative to the scale of the data 
[e.g., see equation (11)]. Specifically, consider the median percent relative 
difference (MPRD) given by 


( 12 ) 


MPRD = median 


< abs 

\zM(A)-z«>(A)] 

[ Z?\A) \ 

l 


x 100 : 


t = 4,...,55,£ 


1,2 ,A£D 



If MPRD in (12) is “close” to zero for a given replicate of the held {R^(A) : 
t = 4,..., 55 ,£ = 1,2, A € -Dj^ n 0 t }, then the predictions are considered close 
(relative to the scale of the data) to the log quarterly average monthly in¬ 
come. In Figure 5(a) we provide boxplots [over 50 independent replicates of 
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(a) Boxplot of MPRD 


(b) Boxplot of stSPE 
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Fig. 5. (a) Boxplots of the MPRD in (12), using the 50 replicates of the spatial 

field {Rf l \A) : t = 4, ...,55,£ = 1,2, A £ D^ 0 0 t }, for observed and missing respec¬ 
tively. (b) Boxplots of the stSPE in (13), using the 50 replicates of the spatial field 
{. R (A) : t = 4,..., 55, i — 1,2, A £ D^o 0 t }, /or observed and missing respectively. In 
both (a) and (b) we do not plot outliers for the purposes of visualization. 

{R[^}] of MPRD evaluated at observed and missing regions, respectively. 
Here, we see that the MPRD is larger at missing regions as expected. How¬ 
ever, the values of the MPRD are consistently small for both observed and 
missing regions: the medians are given by 5.17% and 6.02% for observed 
and missing regions, respectively; and the interquartile ranges are given by 
0.6915 and 0.5470 for observed and missing regions, respectively. Thus, the 
MPRD shows that we are obtaining predictions that are close (relative to 
the scale of the log QWIs) to the log quarterly average monthly income. 

Another metric that one might use to validate our conclusions from Fig¬ 
ure 5(a) is the standardized squared prediction error (stSPE) 

stSPE = average! (Z t W (A) - Z t W (A)) 2 : 

(13) 

t = 4,..., 55, £= 1,2, A eD^j/a 2 . 

If stSPE in (13) is “close” to zero for a given replicate of the field {Rf\A) : 

t = 4,..., 55 ,l = 1,2, A € q t }, then the predictions are considered close 
to the log quarterly average monthly income. Also notice that the stSPE in 
(13) is normalized by ; consequently, we can compare the squared error 
of our predictions relative to the perturbation variances. This is especially 
noteworthy for predictions at missing regions, which have no signal in the 
original perturbed data set. 

In Figure 5(b) we provide boxplots [over 50 independent replicates of 
{R^ }] stSPE evaluated at observed and missing regions, respectively. Here, 
we see that the MSPE is larger at missing regions as expected. However, the 
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values of the stSPE at observed (missing) regions are consistently smaller 
(close) than 1: the medians are given by 0.8154 and 1.1293 for observed 
and missing regions, respectively; and the interquartile ranges are given by 
0.1994 and 0.1990 for observed and missing regions, respectively. Thus, the 
stSPE shows that the error in our predictions at observed (missing) regions 
are smaller than (similar to) the perturbation error (i.e., af). 

Notice that the stSPE is roughly 0.1293 above 1 at missing locations and 
0.1846 below 1 at observed locations; thus, the relative differences from 1 
are similar in the two situations. This may be problematic if there are more 
missing values than observed. However, note that this is not the case for the 
LEHD data set, which has roughly 65% of the prediction locations observed. 

5.3. Predicting quarterly average monthly income. We demonstrate the 
use of MSTM using a high-dimensional multivariate spatio-temporal data 
set made up of quarterly average monthly income obtained from the LEHD 
program. In particular, we consider all 7,530,037 observations introduced in 
Section 2. These values are available over the entire US, which we jointly 
analyze using the MSTM. We present a subset of this data set in Figure 6 (a) 
and (b). We see that the quarterly average monthly income is relatively 
constant across each county of the state of Missouri and that men tend to 
have higher quarterly average monthly income than women. This pattern is 
consistent across the different spatial locations, industries, and time points. 

The primary goals of our analysis in this section is to estimate the quar¬ 
terly average monthly income, investigate potential gender inequality in the 
US, and determine whether or not it is computationally feasible to use the 
MSTM for a data set of this size. Preliminary analyses indicate that the 
log quarterly average monthly income is roughly Gaussian. Since we assume 
that the underlying data is Gaussian, we treat the log of the average income 
as in ( 1 ). 

For illustration, we make the following specifications. Set the target pre¬ 
cision matrix equal to {Qt} as previously described below (8). Let xp (A) = 
(1, I(£ = 1),...,I(£ = 39), I(g = 1) x I{£ = 1),..., I(g = 1) x I{£ = 39))', where 
g = 1,2 indexes men and women, respectively, and recall /(•) is the indicator 
function. Also, following the MSTM specifications from our empirical study, 
we let r = 30, which is roughly 50% of the available MI basis functions at 
each time point t. Using the MSTM with these specifications, we predict 
L x T = 40 x 92 = 3680 different spatial fields. The CPU time required to 
compute these predictions is approximately 1.2 days, with all of our compu¬ 
tations performed in Matlab (Version 8.0) on a dual 10 core 2.8 GHz Intel 
Xeon E5-2680 v2 processor, with 256 GB of RAM. Of course, additional 
efforts in efficient programming may result in faster computing; however, 
these results indicate that it is computationally practical to use the MSTM 
to analyze massive data. 
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(a) Obs. Income For Women (Edc. Industry, 1st Quart, of 2013) (b) Obs. Income For Men (Edc. Industry, 1st Quart, of 2013) 
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(c) Predicted Income For Women (Edc. Industry, 1st Quart, of 2013) (d) Predicted Income For Men (Edc. Industry, 1st Quart, of 2013) 
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(f) Root MSPE For Women (Edc. Industry, 1st Quart, of 2013) 
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Fig. 6. (a) and (b) present the QWI for quarterly average monthly income (US dollars) 
for the state of Missouri, for each gender, for the education industry, and for the first 
quarter of 2013. LEHD does not provide estimates at every county in the US at every 
quarter; these counties are shaded white, (c)-(f) present the corresponding maps (for the 
state of Missouri, for each gender, for the education industry, and for quarter 92) of 
predicted monthly income (US dollars) and their respective posterior square root MSPE 
(on the log-scale). Notice that the color scales are different for women and men and that 
the root MSPEs are computed on the log-scale. White areas indicate missing regions. 
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Although we modeled the entire US simultaneously, for illustration, we 
present maps of predicted monthly income for the state of Missouri, for each 
gender, for the education industry, and for the 92-nd quarter [Figure 6(c) 
and (d)]. The prediction maps are essentially constant over the state of Mis¬ 
souri, where women tend to have a predicted monthly income of slightly less 
than 1200 dollars and men consistently have a predicted monthly income of 
about 1800 dollars. As observed in Figure 6(a) and (b), there is a clear pat¬ 
tern where men have higher predicted monthly income than women. These 
predictions appear reasonable since the maps of the root MSPE (on the log 
scale), in Figure 6(e) and (f), indicate we are obtaining precise predictions 
on the log-scale. Additionally, upon comparison of Figure 6(a) and (b) to 
Figure 6(c) and (d), we see that the predictions reflect the same general 
pattern in the data. These results are similar across the different states, 
industries, and time points. 

To further corroborate the patterns in the MSTM predictions, we fit a 
separate univariate spatial model from Hughes and Haran (2013). Specifi¬ 
cally, we fit the univariate spatial model from Hughes and Haran (2013) to 
the data in Figure 6(a) and (b) with r = 62 basis functions (100% of the 
available basis functions) and obtain the prediction maps (not shown). No¬ 
tably, the predictions are also fairly constant around 1200 and 1800 dollars. 
Moreover, the MSPE of the Hughes and Haran (2013) predictions (summed 
over all US counties) is 4.09 times larger than the MSPE of the predictions 
from the MSTM summed over all US counties. This may be due, in part, to 
the fact that the model in Hughes and Haran (2013) does not incorporate 
multivariate and serial (temporal) dependencies. 

The large difference in average monthly income between men and women 
can be further investigated by comparing the means [i.e., '(•)] for men 

and women, respectively. [Recall from Section 4.1 that we can perform in¬ 
ference on because we impose a nonconfounding property between 

jLt^(-) and Now, let mi 77120 indicate industry 1 through 20 

for men, and W\,... ,W 2 o for women. Then, for a given A consider the con¬ 
trast given by Ylt°= lMgiT^C^) — ^fc=i/ J 92 ^(^); which is interpreted as an 
average difference between the income of men and women over the 20 indus¬ 
tries. Hence, this contrast is a global (across industries) measure of income 
gender differences at the most current time point (notice t = 92). A positive 
(negative) value indicates that men (women) tend to have larger incomes. In 
Figure 7(a) and (b) we plot the posterior mean and variance of this contrast 
by state. Here, we see that for the first quarter of 2013, gender inequality 
is similar across each state (with men consistently having larger quarterly 
incomes), with the largest disparity occurring in Arizona. 

Figure 7(a) and (b) give a sense of the spatial patterns of the between- 
gender income differences for the first quarter of 2013. We can also inves¬ 
tigate the temporal and between-industry patterns in a similar manner. In 


22 


J. R. BRADLEY, S. H. HOLAN AND C. K. WIKLE 


(a) Total Log Average Monthly Income (b) Posterior variances 

by State (1st Quart. 2013) 
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Fig. 7. Plots of contrasts of (A) (referred to as the log average monthly income). 
In (a) and (b), we plot the posterior mean and variance of ^ p^ k \A) ~^2fJ-g2 k \A) by 
state, where the sum aggregates counties to states and is also indexed over the industries. 


particular, in Figure 8(a) we plot ')T /k A /x k mk \A) and A fi[ Wk \A) by quar¬ 
ter (i.e., t). Here, we see that the differences between the genders appears to 
be constant from 1990 to 2013. Likewise, in Figure 8(b) we identify between 
industry differences by plotting the posterior mean of A and 

St, A / i[ Wk \A ) by industry (i.e., k). Here, we observe that gender inequality 
appears present in each industry, with men consistently having larger mean 
average monthly income. That is, the posterior mean of ^fi[ nik \A) and 
the values within 95% (pointwise) credible intervals are larger than that for 
women. Furthermore, we see that the largest difference between log average 
monthly income occurs in the finance and insurance industries, which also 
appear to be the most lucrative industries for men. 

It should be noted that, despite the inherent computational issues, hav¬ 
ing an abundance of data has distinct advantages. For example, notice in 
Figure 6(b) that LEHD does not release data at two counties of Missouri 
for men in the education industry during quarter 92. Although these values 
are missing for this variable and time point, LEHD releases QWIs at these 
two counties (for men in the education industry) for 43 different quarters. 
Hence, with the observed values from 43 different spatial fields, we reduce 
the variability of predictions at the two missing counties during the 92nd 
quarter [compare Figure 6(b) to (f)]. This is particularly useful for the set¬ 
ting when a states does not sign a MOU and, hence, LEHD does not provide 
estimates here. 


6. Discussion. We have introduced fully Bayesian methodology to ana¬ 
lyze areal data sets with multivariate spatio-temporal dependencies. In par¬ 
ticular, we introduce the multivariate spatio-temporal mixed effects model 
(MSTM). To date, little has been proposed to model areal data that ex¬ 
hibit multivariate spatio-temporal dependencies. Furthermore, the available 
alternatives [see Carlin and Banerjee (2003) and Daniels, Zhou and Zou 
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(a) Total Mean Log Average Monthly Income by Quarter and Gender 



(b) Total Log Average Monthly Income by Industry dnd Gender 



Fig. 8. Plots of contrasts of (A) (referred to as the log average monthly income). 
In (a), we plot the posterior mean of ^2 k A ii( mk ' > (A) and ^2 k A fi[ Wk \A ) by quarter. In 

(b), we plot the posterior mean of^2 t A fj,[ rnk \A) and A fi[ Wk \A) by industry. In both 
(a) and (b) a 95% credible interval is given by horizontal line segments, and for compar¬ 
ison a line is superimposed connecting the intervals associated with males and females, 
respectively. 
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(2006)] do not allow for certain complexities in cross-covariances and fail 
to accommodate high-dimensional data sets. Hence, the MSTM provides an 
important addition to the multivariate spatio-temporal literature. 

The MSTM was motivated by the Longitudinal Employer-Household Dy¬ 
namics (LEHD) program’s quarterly workforce indicators (QWI) [Abowd 
et al. (2009)]. In particular, the QWIs are extremely high-dimensional and 
exhibit complex multivariate spatio-temporal dependencies. Thus, exten¬ 
sive methodological contributions, leading to the MSTM, were necessary in 
order to realistically, jointly model the QWIs’ complex multivariate spatio- 
temporal dependence structure and to allow for the possibility of remarkably 
high-dimensional areal data. 

We conducted an extensive empirical study to demonstrate that the MSTM 
works extremely well for predicting the QWI, quarterly average monthly in¬ 
come. Specifically, we perturb the log quarterly average monthly income, 
then predictions of the log quarterly average monthly income are made us¬ 
ing the perturbed values and comparisons are made between the predicted 
and the actual log quarterly average monthly income. The results illustrate 
that we are consistently recovering the unobserved latent field using the 
MSTM at both observed and missing regions. This is particularly notewor¬ 
thy, since there are no other methods that have been used to estimate QWIs 
at missing regions. In fact, because we borrow strength over different vari¬ 
ables, space, and time, we can also predict values for entire states when the 
values are missing for reasons of an unsigned MOU. 

The exceptional effectiveness of our approach is further illustrated through 
a joint analysis of all the available quarterly average monthly income esti¬ 
mates. This data set, comprised of 7,530,037 observations, is used to predict 
3680 different spatial fields consisting of all the counties in the US. The 
recorded CPU time for this example was 1.2 days, which clearly indicates 
that it is practical to use the MSTM in high-dimensional data contexts. 

In this article, we have found that incorporating different variables, space, 
and time into an analysis is beneficial for two reasons. First, one can lever¬ 
age information from nearby (in space and time) observations and related 
variables to improve predictions and, second, there are inferential questions 
that are unique to multivariate spatio-temporal processes. For example, in 
Section 5.3 it was of interest to determine where, when, and what industry 
had the largest disparity between the average quarterly income of men and 
women. Here, we found that these differences have been relatively constant 
over the last two decades, are currently fairly constant over each state, and 
are the highest within the finance and insurance industries. 

Although our emphasis was on analyzing QWIs, our modeling frame¬ 
work allows the MSTM to be applied to a wide array of data sets. For 
example, the MSTM employs a reduced rank approach to allow for massive 
multivariate spatio-temporal data sets. Additionally, the MSTM allows for 


MULTIVARIATE SPATIO-TEMPORAL MODELS 


25 


nonstationary and nonseparable multivariate spatio-temporal dependencies. 
This is achieved, in part, through a novel propagator matrix for a first- 
order vector autoregressive [VAR(l)] model, which we call the MI propaga¬ 
tor matrix. This propagator matrix is an extension of the MI basis function 
[Griffith (2000, 2002, 2004), Griffith and Tiefelsdorf (2007), Hughes and Ha- 
ran (2013), Porter, Wikle and Holan (2015)] from the spatial-only setting 
to the multivariate spatio-temporal setting. We motivate both the MI basis 
function and the MI propagator matrix as an approximation to a target 
precision matrix, that allows for both computationally efficient statistical 
inference and nonconfounding regression parameters. 

Our model specification also allows for knowledge of the underlying spa¬ 
tial process to be incorporated into the MSTM. Specifically, we propose an 
extension of the MI prior to the spatio-temporal case. This extension forces 
the covariance matrix of the random effect to be close (in Frobenius norm) to 
a “target precision” matrix, which can be chosen based on knowledge of the 
underlying spatial process. Importantly, this contribution has broader im¬ 
plications, in terms of reducing a parameter space, for defining informative 
parameter models for high-dimensional spatio-temporal processes. 

There are many opportunities for future research. For example, there are 
many QWIs available that are recorded as counts, which do not satisfy the 
Gaussian assumption even after a transformation. Thus, the MSTM could 
be extended to the Poisson data setting. The parameter model introduced in 
Section 4.3 is also of independent interest. In our applications, we let {Q t} be 
the target precision. However, one could conceive of various different “target 
precisions” built from deterministic models (e.g., for atmospheric variables). 
Another avenue for future research is to extend the MI propagator matrix, 
beyond the VAR(l) specification. In fact, this strategy could be easily used 
for many subject matter domains for other time series models. 

APPENDIX A: TECHNICAL RESULTS 

Proposition 1. Let 3?^ be a generic nxr real matrix such that *. = 
I r , C be a generic r x r positive definite matrix, P k be a generic nx n pos¬ 
itive definite matrix, and let k = 1 Then, the value of C that min¬ 
imizes l|Pfc ~ within the space of positive semi-definite 

covariances is given by 

(AT) C* = ^A + (^Jl^ P ^k 

where M + (R V ) is the best positive approximate [Higham (1988)] of a real 
square matrix R. Similarly, the value of C that minimizes J2k=i ||Pfc — 
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within the space of positive semi-definite covariances is given by 

K \ 


(A.2) 




K 


k =1 


Proof. By definition of the Frobenius norm, 

K 

]T||p fc - 

k =1 

K 

= ]Ttrace{(P fc - ^Cr 1 ^)'^ - ^CT 1 *'*)} 

k =1 


K 


(A.3) 


y^jtrace(PfcPfc) — 2 x trace(^^.P^fcC 1 )+trace(C 2 )} 
k =1 

y^trace(PfcPfc) - K x trace | ^ ^ $' fc P fc $ fe ^ | 
c ~ 1 -^£*'* p *** 


+ K x 


fc=i 


It follows from Theorem 2.1 of Higham (1988) that the minimum of (A.3) is 
given by equation (A.l) in the main document. In a similar manner, if one 
substitutes C for C” 1 in (A.3), then we obtain the result in equation (A.2) 
in the main document. □ 


Proposition 2. Let Sv.i be the MI propagator matrix and C be a 
generic r x r positive definite matrix. Then, the value of C that minimizes 
||Qi — Sx,iCS( Y within the space of positive semi-definite covariances is 
given by 

(A.4) C* = AI + (S: y>1 QiS x ,i). 

Proof. The proof of Proposition 2 follows immediately from Proposi¬ 
tion 1. Specifically, let K = 1, = Sx,i, and Pi = Qi. Then, apply Propo¬ 

sition 1. If S' x 1 QiSx,i is positive definite, then (17) leads to the prior spec¬ 
ification in Hughes and Haran (2013). Porter, Holan and Wikle (2015) show 
that S^QiSx.i is positive definite as long as an intercept is included in 
the definition of Xi. □ 
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APPENDIX B: FULL CONDITIONAL DISTRIBUTIONS 
The model that we use for multivariate spatio-temporal data is given by 
Data model: zj C) (A)\f3 t ,r) t ,^ () (■) 

~ d Normal (x^ (A)'/3 t + S x] t (A)'ri t + ^ (A)); 
Process model 1 : ~ Gaussian(Ms,t» 7 j_ 1 , Wt); 

Process model 2: 77 -JK 1 ~ Gaussian(0, Ki); 

Process model 3: Normal(0, cr| )t ); 

Parameter model 1: 5^ 10 ( 0 :^, {3 V '), 

Parameter model 2: j3 t ~ Gaussian(/Xg, cr|l p ); 

Parameter model 3: a 2 t n-j IG (a^,/3^); 

Parameter model 4: (t|- ~ IG(a^, /3#); 

1= 1,... ,L,t = TP,.. .,TjP,k = 1,2, A € D®, 

where cr| > 0, > 0, > 0, ax > 0, /3 V > 0, /% > 0, and fix > 0. In Sec¬ 

tions 5 and 6 the prior mean of is set equal to a p-dimensional zero vector, 
and the corresponding variance <x| is set equal to 10 15 so that the prior on 
{/3 t } is vague. In Sections 5 and 6 we also specify a^, ax, f%, /?£, and (3x 
so that the prior distributions of <r| t and a 2 K t are vague. Specifically, we let 
a £ = ax = 2, and (3 V = f3^ = fix = 1; here, the IG(2,1) prior is interpreted 
as vague since it has infinite variance. 

We now specify the full conditional distributions for the process variables 


i.e., {rj t } and (•)}] and the parameters [i.e., {v) r> (•)}, £ }, and 


Wi 


a 


Ki 


Full conditionals for process variables. Let the rt/-dimensional random 
vectors z t = (Z t W (A) : l = 1,..., L, A e D^J, £ t = (£ t W (A) : £ = 1,..., L, A € 

Dq\)', and the nt x p matrix X t = (xj^(A) : i = 1,... ,L, A € Dq\)'; t = 
1,..., T. Then, we update the full conditional for r] l:T = (rj' t : t = 1,..., T)' 
at each iteration of the Gibbs sampler using the Kalman smoother. We ac¬ 
complish this by performing the following steps: 

1. Find the Kalman filter using the shifted measurements {z \ : zj = z* — 
Xt/3 t — £ t } [Shumway and Stoffer (2006), Carter and Kohn (1994), Friihwirth- 
Schnatter (1994), Cressie and Wikle (2011)]. That is, for t = l,...,T com¬ 
pute 

(a) T 7 |jJ = F(T 7 t | 2 1:i , 0 j j] ), 
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( c ) p[|] = cov(?7 f |ii :t ,0b ] ), 

where P^ = (<t^) 2 K* and d\^ represents the jth MCMC draw of 6 t and 
a 2 K , respectively. 

2. Sample ~ Gaussian^bL, pf 2 ^). 

3. For t = T — 1,T — sample 

rf +1] ~ Gaussian^ + ~ *7*%)^ ~ 

where J? 1 = PgM'(P^)- 1 . 

Notice that within each MCMC iteraction we need to compute the Kalman 
filter and Kalman smoothing equations. This adds more motivation for re¬ 
duced rank modeling, that is, if r is large (i.e., if r is close in value to n), 
this step is not computationally feasible. 

The full conditional for the remaining process variable (£^(-)} can also be 
computed efficiently [Ravishanker and Dey (2002)]. The full conditional for 
{z!t ] (•)} is gi yen by £ t ~ Gaussian(/i| t , E| t ), where t = (V^T 1 + a^ 2 l Nt )~ 1 , 
Hit = X V) -1 x (z t - X0 t - S tVt), V t = diag {v[ i] (A) : £ = 1,..., L, A e 
D$ t ), and S t = (Sf } (A) : l = 1,..., L, A G t = 1,... ,T. 

Full conditionals for the parameters. Similar to the full conditional for 
[Ravishanker and Dey (2002)], we also have the following full condi¬ 
tional for (3 t : (3 t ~ Gaussian(/u^ t , where = (X)V)" 1 X i -(-(j^ 2 Ip) _1 , 
and H*pt = x ( z t — — S^Q; £ = 1,... ,T. The exact form of 

the full conditionals for <j 2 k and {<r 2 f } can also be found in a straight¬ 
forward manner. It follows that the full conditionals for a 2 K and cr| t are 

IG(Tr /2 + 2,1 + ViKr 1 r ?1 /2 + £f= 2 fai - 1 - M f r ?i _’ 1 )/2) 

and IG(n/2 + 2, l + £j£ t /2) (for t = 1,... ,T), respectively. 

Imputation variances for QWIs are not currently available for each county/ 
quarter/industry/gender combination, which is the multivariate spatio- 
temporal support of the data in Section 2. Thus, we specify a prior distribu¬ 
tion for {v^\A)} that capitalizes on the available information, namely, im¬ 
putation variances defined for QWIs given at each county/quarter/industry 
combination. Denote these imputation variances with where m = 
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1,..., 20 and t = 1,..., 92. This leads us to our prior for 



(T)} given by 


v[ l \A) 


vf ] (A)<5 (1) / exp{2zf } (A)}, 

if ^ = 1,... ,20, 

~(<?- 20) ( A } S (2)/ exp {2zf } (A)}, 

if £ = 21,..., 40; t = 1,..., 92, A e D { § v 


where 5^ > 0 for k = 1,2, and we let £ = 1 ,..., 20 indicate men in each of the 
20 industries and £ = 21,..., 40 indicate women in each of the 20 industries, 
respectively. We divide by exp{2Z) ; (A)} to transform v t ; to the log-scale; 
specifically, we use the delta method [see Oehlert (1992), among others] to 
transform the variances to the log-scale. Thus, our model for the variances 
{v^(A)} is a simple reweighting (by weights in {5^}) of the imputation 
variances (on the log-scale) obtained from the LEHD program. We note that 
our predictions are relatively robust to this specification. 

In the empirical study in Sections 5.1 and 5.2, we use the known value of 


vf^ (A) and, hence, no distribution was placed on <5^ and d ^ 2 ' 1 . In many cases 
this is reasonable since the statistical agency provides values for v^(A). In 
Section 5.3 we let 5^ ~IG(1,2); k = 1,2. Now, let £ = 1,..., 20 indicate 
the spatial fields corresponding to each of the 20 industries for men and 
£ = 21,... ,40 indicate the spatial fields corresponding to each of the 20 in¬ 
dustries for women. The full conditionals for d^ 1 - 1 and 5^ are IG(M/2 + 2, 1 + 


IG(F/2 + 2. 1 + E*21 EEiE. cd m (Z< 1 "(T - xrum - sZWm - 




M, 


>(*) 


J A& D Q t 

(A)), where M = l n ? and F = Ef= 2 i Hth n ? ■ 

In some settings, survey error variances are not provided. The case of un¬ 
known survey variance leads to interesting and difficult modeling questions. 
In particular, when var(ej^) = v^\-) is unknown, there may be issues with 
identifiability between <r| t and v^(-) when v^\-) is roughly constant across 
variables and locations [see Bradley, Cressie and Shi (2015), for a discussion]. 
To avoid this issue of identifiability, one might combine £(•) and ej^(-), and 
then estimate the sums £(•) + e^(-) and vp (•) + cr| t , respectively. In the 
environmental context, others have addressed this identifiability problem by 
avoiding the use of likelihoods and adopting a moment-based approach to 
estimate v t ; (-); specifically, see Kang, Cressie and Shi (2010) and Katzfuss 
and Cressie (2012) for the definition of a variogram-extrapolation technique 

(f)\ 

to estimate v\ (■) and Kang, Cressie and Shi (2010) for a method of mo¬ 
ments estimator. 
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